Abstract 

Background: Phlebotomus papatasi is a natural vector of Leishmania major, which causes cutaneous leishmaniasis 
in many countries. Simple sequence repeats (SSRs), or microsatellites, are common in eukaryotic genomes and are 
short, repeated nucleotide sequence elements arrayed in tandem and flanked by non-repetitive regions. The 
enrichment methods used previously to find new microsatellite loci in sand flies remain laborious and time 
consuming. In silico mining, in which microsatellites are retrieved and screened from large amounts of 
sequence data in public databases using microsatellite search tools, can yield many new candidate markers. 

Results: Simple sequence repeats (SSRs) were characterized in P. papatasi expressed sequence tags (ESTs) derived 
from a public database, National Center for Biotechnology Information (NCBI). A total of 42,784 sequences were 
mined, and 1,499 SSRs were identified, a frequency of 3.5% and an average density of one SSR per 15.55 kb. 
Dinucleotide motifs were the most common SSRs, accounting for 67%, followed by tri-, tetra-, and penta-nucleotide 
repeats, accounting for 31.1%, 1.5%, and 0.1%, respectively. The length of microsatellites varied from 5 to 16 
repeats. Among the dinucleotide motifs, AG and CT had the highest frequency. Dinucleotide SSR-ESTs are relatively biased 
toward an excess of (AX)n repeats and a low GC base content. Forty primer pairs were designed based on motif 
lengths for further experimental validation. 

Conclusion: The first large-scale survey of SSRs derived from P. papatasi is presented; dinucleotide SSRs identified 
are more frequent than other types. EST data mining is an effective strategy to identify functional microsatellites in 
P. papatasi. 
Background 

The sand fly Phlebotomus (Phlebotomus) papatasi (Sco- 
poli) is a natural vector of Leishmania major (Yakimov 
& Schokov), which is the causative agent of zoonotic 
cutaneous leishmaniasis in the Middle East and other 
countries [1,2]. Simple sequence repeats (SSRs) or 
microsatellites, are common components of eukaryotic 
genomes and are short, repeated nucleotide sequence 
elements arrayed in tandem and flanked by non-repeti- 
tive regions [3,4]. SSRs often harbour high levels of 
polymorphism, in terms of repeat number, and have 
been developed into one of the most common classes of 
genetic markers due to their high degree of ubiquity, 
co-dominance and variability in number among individuals. 
In recent years, microsatellites have been extensively 
used to investigate genetic variability and the population 
structures of a wide range of organisms, including para- 
sites and vectors of infectious diseases [5-13]. In the 
absence of genome sequences for sand flies, the isolation 
of microsatellite markers was carried out using various 
enrichment methods [14,15]. This approach has led to 
the development of a panel of five polymorphic and 
informative microsatellite markers for P. papatasi 
[16-18]. 

Parallel to the rapid increase in availability of diverse 
DNA sequence data, which resulted from the huge 
advancement of sequencing techniques, labour-intensive 
methods for the generation of microsatellite markers have 
been replaced gradually by in silico data mining of geno- 
mic and expressed sequence tag (EST) datasets [19-21]. 



Microsatellites are effectively randomly distributed 
throughout the genome and can occur in transcribed elements. 
SSRs derived from transcribed ESTs can still maintain 
allelic variability comparable with that in non-transcribed 
genomic DNA, so they can serve as molecular markers for 
numerous applications [22,23]. EST databases have been a 
rich source of SSRs for the development of genotyping 
applications. Marker development from already existing 
sequence data is rapid, efficient and economical. Any type 
of SSR will be detected using an appropriate search program, 
whereas only SSRs with pre-defined motifs are captured by 
enrichment. In addition, EST-derived SSRs are physically 
linked to expressed genes and thus represent functional 
markers. 

The aims of this study were to expand the genomic 
resources for P. papatasi by analyzing the 42,784 ESTs 
available in the GenBank database, to increase the number 
of SSR markers by mining these previously developed ESTs, 
and to evaluate specifically designed primer pairs with 
respect to SSR abundance and motif type. 

Results 

Sequence analysis 

The whole data set comprised ESTs with an average 
size of 469 bp. Sequence composition showed a slight 
bias toward A and T: A+T = 13,235,131 (56.5%), 
whereas G+C = 10,149,530 (43.4%). The frequencies of 
the individual nucleotides (A), (C), (G) and (T) were 
28.7, 21.8, 21.5, and 27.8%, respectively. 
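
The composition figures above are simple tallies over all sequences. A minimal sketch of such a tally is given here for illustration; it is not the CLC Genomics Workbench routine used in the study, the function name is hypothetical, and ignoring ambiguous bases such as N is an assumption of the sketch.

```python
from collections import Counter


def base_composition(sequences):
    """Return per-base percentages plus the A+T and G+C percentages.

    Only unambiguous bases (A, C, G, T) are counted; anything else
    (for example N) is ignored. That choice is an assumption of this sketch.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    pct = {base: 100 * counts[base] / total for base in "ACGT"}
    pct["A+T"] = pct["A"] + pct["T"]
    pct["G+C"] = pct["G"] + pct["C"]
    return pct


toy_ests = ["AATTGGCC", "ATATATAT", "acgtn"]  # toy data, not real ESTs
for key, value in base_composition(toy_ests).items():
    print(f"{key}: {value:.1f}%")
```

Percentages computed this way are relative to A+C+G+T only, so they may differ in the last digit from values calculated over a total that includes ambiguous bases.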

SSR types, distribution and frequency 

Of the 42,784 ESTs analyzed, 1,499 (3.5%) SSRs were 
characterized. The number of repeats per SSR motif 
ranged from 5 to 16, with 5-9 being most frequent. 
On average, one SSR was found in every 15.55 kb of 
ESTs, and the total length of the regions containing 
repeats was 0.079% of the total EST size. A total of 
93 ESTs were found to have more than one SSR motif. 
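
The frequency and density figures are ratios of three totals. The sketch below shows the arithmetic; the function name is illustrative, and taking the summed A+T and G+C counts from the sequence analysis (23,384,661 bases) as the total EST length is an assumption, so the density it gives (about 15.6 kb per SSR) is close to, but not identical with, the reported 15.55 kb.

```python
def ssr_summary(n_ests, n_ssrs, total_bp):
    """Return (SSR frequency in % of ESTs, average kb of sequence per SSR)."""
    if n_ests <= 0 or n_ssrs <= 0:
        raise ValueError("counts must be positive")
    frequency_pct = 100 * n_ssrs / n_ests
    kb_per_ssr = total_bp / n_ssrs / 1000
    return frequency_pct, kb_per_ssr


# total_bp is assumed to be the A+T plus G+C counts reported above.
frequency, density = ssr_summary(n_ests=42_784, n_ssrs=1_499,
                                 total_bp=13_235_131 + 10_149_530)
print(f"frequency: {frequency:.1f}% of ESTs")
print(f"density: one SSR per {density:.2f} kb")
```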

SSR loci were categorized by repeat type and structure; 
the dinucleotide repeat motifs were most abundant, 
accounting for 67% of all SSRs characterized, followed 
by the trinucleotides (31.1%), tetranucleotides (1.5%), 
and pentanucleotides (0.1%) (Table 1). No hexanucleotide 
SSRs were detected in P. papatasi ESTs. Among the 
dinucleotide motifs, the AG/TC type was more abundant 
(37%) than the CT/GA (25.3%) and AT/TA types (22.2%); 
few CA/GT (7.1%), AC/TG (5.6%) and CG/GC (2.8%) types 
were characterized (Figure 1). For the trinucleotide 
SSRs, 467 motifs and 29 motif types were identified for 
P. papatasi; the TTC motif was the most abundant (13%), 
followed by AAT (11%), CAG and CAA (7% each), AAC and 
ATC (6% each), and ACA (5%), while the other motifs 
were at lower frequencies (Figure 2). Five types of 
tetranucleotide motifs were characterized: AAAT, ATTT, 
TCTT, AAAG, and TTAT, with frequencies of 15, 4, 2, 1, 
and 1%, respectively. Two identical pentanucleotide 
motifs of the AATTG type (0.1%) were also identified. 

Table 1 Summary of in silico mining of EST sequences for 
the P. papatasi sand fly 

Parameters                         Number (%) 
Total EST sequences                42,784 
Total number of SSRs identified    1,499 (3.5%) 
Frequency of SSRs                  One every 15.55 kb 

Numbers in parentheses are the percentage values of the repeat type. 

SSR marker development 

Of 1,499 unique ESTs containing SSRs, 630 (42%) were 
suitable for primer design, comprising 425 dinucleotide, 
271 trinucleotide and 9 tetranucleotide SSRs. The 
remaining sequences were inappropriate for primer 
design, mainly because of insufficient DNA sequence 
flanking the microsatellite core, or the sequences them- 
selves not being suitable for primer design. Thus, overall 
SSR primers could be designed to amplify non-redun- 
dant loci from ~ 1.5% of the initial number of ESTs. 
Based on the size of repetitive motifs, a subset of 40 
primer pairs was selected and designated as prime 
candidates for polymorphism analysis, using a 
minimum repeat length criterion of 5 repeats. This 
subset comprised 27 dinucleotide, 8 trinucleotide and 
5 tetranucleotide SSRs (Table 2). 

Discussion 

Molecular markers are central for investigating genetic 
variability and for understanding genome dynamics. In 
the case of sand flies, the development of molecular 
markers, however, has remained slow. Microsatellites or 
SSRs have proven to be useful markers in population 
genetic studies of sand flies [16]. The presence of SSRs 
in coding regions suggests their importance as func- 
tional markers. While the development of microsatellite 
markers for sand flies from genomic libraries has been 
relatively costly, labour intensive and time consuming 
[14-16,18], the mining of microsatellite markers from 
EST data overcomes these disadvantages. 

The ESTs used in the present study were normalized. 
Hence, redundancy in the EST database was minimized 
and a wealth of unique cDNA sequences (unigenes) for 
marker development was found. Examining the distribu- 
tion of SSR motifs can assist in gaining insights into 
genome composition and genetic makeup [24,25]. 
Although only SSR motifs with five or more repeats were 
considered here, the SSRs identified were relatively 
short. The maximum length observed was 16 repeats; this 
is consistent with studies that revealed shorter SSRs in 
Drosophila [26,27]. 

Figure 1 Frequency distribution of different repeat types (2-5 motif units) identified in ESTs from P. papatasi. The numbers on the bars 
indicate the percentage of each repeat type in the total number of microsatellites. 

Dimeric repeat motifs were more abundant than trimeric 
repeats. This observation is not surprising, as the 
frequency and distribution of SSRs depend on several 
factors, such as the size of the dataset and the tools 
and criteria used for SSR discovery. Tetra- and 
penta-repeat motifs were considerably less represented. 

The most abundant SSRs were of dinucleotide type 
(Figures 1 and 2), in which homopurine-homopyrimi- 
dine stretches, such as AG and CT, have the highest fre- 
quency. Dinucleotide repeats are typically more frequent 
in noncoding regions [28-30]; however, they occur occa- 
sionally in coding regions as well [31]. Some dinucleo- 
tides, such as (AG)n/(CT)n, are not selectively neutral 
and may have functional roles. These repetitive 
sequences occur in the 5'-UTR and are likely to be 
involved in gene regulation [32-34]. The (GC)n repeats 
were absent from P. papatasi, even though they are 
numerically abundant SSR loci in most eukaryotes 
[35,36]. Dinucleotide SSR motifs in P. papatasi 
ESTs are relatively biased toward an excess of (AX)n 
repeats and a low GC base content; the broader 
implications of this observation are unclear. 

The high frequency of dinucleotide motifs (AT, AG, 
and CT) could be explained by their abundance in sev- 
eral codons with different nucleotide arrangements. This 
observation is in agreement with previous reports 
[32,37]. EST-derived AG/CT repeats have been 
studied widely in eukaryotes, particularly in plants, and 
found to be highly abundant and highly polymorphic 
[38,39]. For P. papatasi, the number of published SSR 
markers is very limited compared with other major 
insect vectors, including species of Anopheles and Aedes 
[14]. In the present study, an in-depth analysis of micro- 
satellites, in terms of density, resulted in the develop- 
ment of a new set of 40 SSR markers (Table 2). Thus, 
we have shown that the mining of ESTs is an effective 
strategy to identify functional microsatellites, with per- 
fect repeats, in P. papatasi. 

Figure 2 Frequency distribution of (a) di-, (b) tri-, and (c) tetra-nucleotide repeat motifs of P. papatasi. 

A prevalence of trinucleotide SSRs in P. papatasi 
ESTs would be expected, since they do not interrupt triplet 
codons, whereas other repetitive stretches, such as mono-, 
di-, or tetra-nucleotides, lead to frame-shift mutations, 
which would result in severe adverse effects in coding 
regions. Indeed, the abundance of trinucleotide SSRs in 
coding regions of various organisms is much higher 
than in non-coding regions of the genome [37,40-45]. In 
contrast, the present study showed that trinucleotides 
were only the second most abundant SSRs in P. papatasi 
ESTs (31.1%), after dinucleotides (67%). This observation 
could be explained by the SSR mining tool used here 
and its preset criteria, such as the minimum number of 
repeats, which could have led to the identification of 
more dinucleotides and fewer tri- and tetra-nucleotides, 
with this bias contributing to the over-representation of 
dinucleotides. Another possible explanation is that P. 
papatasi EST data do not contain many trinucleotide 
SSRs compared with dinucleotides. 
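
The effect of a fixed minimum repeat count can be made concrete: with the same threshold for every motif size, a dinucleotide tract qualifies at a much shorter length than a tri- or tetranucleotide tract. A small sketch, assuming the threshold of five repeats given in Methods (the function name is illustrative):

```python
MIN_REPEATS = 5  # minimum repeat count stated in Methods


def min_tract_length(motif_len, min_repeats=MIN_REPEATS):
    """Shortest perfect tract, in bp, that passes the repeat-count threshold."""
    return motif_len * min_repeats


for name, motif_len in [("di", 2), ("tri", 3), ("tetra", 4), ("penta", 5)]:
    print(f"{name}-nucleotide SSR: at least {min_tract_length(motif_len)} bp")
```

Under this threshold a 10 bp dinucleotide tract is reported, whereas a trinucleotide tract must reach 15 bp, which is one way such a criterion could favour dinucleotides.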

Conclusions 

This is the first large-scale survey of EST-SSRs of 
P. papatasi, covering 1,499 unique SSRs. Despite the 
number of EST sequences surveyed, SSR loci do not appear 
to be particularly dense or frequent in P. papatasi 
(3.5%). The SSRs characterized are mainly of the 
dinucleotide type and are heterogeneously distributed 
across all potential base compositions, with a small 
number of GC-rich repeat motifs. The DNA replication 
machinery likely contributes to the elevated abundance 
of dinucleotide AT- and AG-rich repeat motifs and, to a 
lesser extent, trinucleotide motifs, suggesting that 
future screens for molecular markers in P. papatasi and 
other sand flies may benefit from focusing on these SSR 
motifs. The utility of the microsatellite markers 
characterized in this study should be evaluated in the 
near future. More microsatellite markers should be 
characterized for P. papatasi and other key sand flies of 
major importance as vectors of Leishmania. 

Table 2 Primers designed and suggested to amplify repeat sequences (SSRs), including number of repeats, product size 
in bp, forward (Fw-) and reverse (Rv-) primer sequences, and melting temperature (Tm). 

Name      Accession no.  SSR          Product size  Fw-Primer (5'-3')       Tm (°C)      Rv-Primer (5'-3')          Tm (°C)
PPEST 1   [illegible]    [illegible]  170           [illegible]             [illegible]  [illegible]                [illegible]
PPEST 2   [illegible]    [illegible]  141           [illegible]             [illegible]  [illegible]                [illegible]
PPEST 3   [illegible]    [illegible]  188           [illegible]             61.0         [illegible]                60.1
PPEST 4   [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 5   [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 6   [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 7   [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 8   [illegible]    [illegible]  140           [illegible]             60.2         [illegible]                59.7
PPEST 9   [illegible]    [illegible]  175           [illegible]             61.2         [illegible]                60.6
PPEST 10  [illegible]    [illegible]  236           [illegible]             61.4         [illegible]                62.2
PPEST 11  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 12  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 13  [illegible]    [illegible]  222           [illegible]             58.9         [illegible]                59.6
PPEST 14  [illegible]    [illegible]  228           [illegible]             60.1         [illegible]                59.5
PPEST 15  [illegible]    (TGA)9       157           [illegible]             60.2         [illegible]                59.7
PPEST 16  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 17  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 18  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 19  [illegible]    [illegible]  211           [illegible]             [illegible]  [illegible]                [illegible]
PPEST 20  [illegible]    [illegible]  293           [illegible]             60.4         [illegible]                60.4
PPEST 21  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 22  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 23  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 24  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 25  [illegible]    (TTTA)8      132           [illegible]             60.2         [illegible]                60.5
PPEST 26  [illegible]    [illegible]  217           [illegible]             58.9         [illegible]                59.5
PPEST 27  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 28  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 29  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 30  [illegible]    [illegible]  373           [illegible]             59.5         [illegible]                60.4
PPEST 31  [illegible]    [illegible]  [illegible]   [illegible]             [illegible]  [illegible]                [illegible]
PPEST 32  EY210648.1     [illegible]  123           ATOAOTCAAACCCCTCCm      60.2         AACTGGGTGGTOG[illegible]   60.5
PPEST 33  FG115100.1     (GAA)15      251           ATACTCCCTCAGAACTAGCCCC  60.0         TOGTOTOTOTOTCCTCC          59.1
PPEST 34  EY215687.1     (AAAG)5      137           CACCTACAGAGATGCTGGATTG  59.8         GGGCTAAAATGTGTOTGACTTG     60.0
PPEST 35  EY214242.1     (GT)15       245           TAGTCACAACACACGAACCACA  60.1         ™accgtgagagtaccagcaa       59.8
PPEST 36  EY203279.1     [illegible]  123           ATOACTOAAACCCCTCCm      60.2         AACTGGGTGGTOG[illegible]   60.5
PPEST 37  EX474189.1     (AT)7        208           ACCGTGCAACCA^AAGTC      60.3         ag™tototc™ctgcgcc          59.5
PPEST 38  FK811878.1     (AG)5        256           TCCAGATACTCAAGTOCAGCC   60.6         TATAGCGTOAGATCCACCAGA      59.7
PPEST 39  FG107375.1     (CT)6        292           CCCCAAAGAGAGTACACCAAAG  60.0         ATCAGCCAGTGTCGTATGAATG     60.0
PPEST 40  FG114532.1     (AG)5        322           TCCCAAGGCTATOAGTCTGGT   59.2         GGCTATCGTGCAA^CTOT         59.8

Methods 

Retrieval of EST sequences 

P. papatasi EST sequences were retrieved directly from 
the NCBI dbEST database 
(http://www.ncbi.nlm.nih.gov/projects/dbEST/) on May 10, 
2011. A total of 42,784 P. papatasi ESTs were listed and 
annotated. These ESTs were derived from three cDNA 
libraries constructed from uninfected sugar-fed, 
uninfected blood-fed, and L. major-infected blood-fed 
P. papatasi sand flies. All the sequences were saved in 
FASTA-formatted text files that were used for further 
analysis. 
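
For readers who want to work with such FASTA files, a minimal reader can be sketched as follows. It is a plain-Python illustration rather than the tooling used in the study; the function name and the toy records are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


toy_fasta = ">est_1 toy record\nACGTAC\nGGT\n\n>est_2 toy record\nTTTAGC\n"
records = read_fasta(toy_fasta.splitlines())
mean_length = sum(len(seq) for _, seq in records) / len(records)
print(len(records), "records, mean length", mean_length, "bp")
```

With a real file, the same function can be fed an open file handle, since it only iterates over lines.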

Characterization of SSRs 

PolyA and polyT tracts were removed, leaving no (T)10 
or (A)10 in any 10 bp window at either end of the 
sequences. The dataset was divided into small files, each 
containing 100 FASTA-formatted sequences. SSR-containing 
sequences were identified using SSRIT, a web-based SSR 
identification tool [46] available at 
http://www.gramene.org/db/markers/ssrtool. Any sequence 
was considered to contain an SSR where a repeat motif of 
one to six nucleotides in length was repeated at least five 
times for dinucleotide, trinucleotide, tetranucleotide and 
pentanucleotide SSRs. Redundant sequences were filtered 
by BLAST analysis, using each individual sequence 
as a query against the total set of selected sequences. 
Homologous sequences were aligned using MEGA 5 
and scanned manually in the sequence editor window 
[47]. The criteria for redundancy were: (i) where a 
cluster contained two or more identical sequences, the 
longest was retained; (ii) sequences which were composed 
entirely of SSR motif, lacking any flanking sequence, 
were discarded since their uniqueness could not be 
established and, in any event, primer design was not 
possible. 
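
The batching and repeat-detection steps can be sketched in a few lines. The regular-expression search below is a stand-in for SSRIT, not its actual implementation; the function names are hypothetical, and restricting the motif sizes to 2-5 bp (the classes reported in Results) is an assumption of the sketch.

```python
import re


def batches(items, size=100):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq, min_repeats=5, motif_sizes=(2, 3, 4, 5)):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        # One motif of length k followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. "AGAG" counted as a tetramer
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


toy_est = "TTC" + "AG" * 6 + "CCT"  # toy sequence, not a real EST
print(find_ssrs(toy_est))
```

Because the search reports the leftmost phase of a repeat, the same locus may be labelled with a rotated motif (for example GA rather than AG) depending on the flanking base.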

Sequence analysis 

The total number of characters, sequence composition 
frequency and A+T and G+C contents were calculated using 
the CLC Genomics Workbench program, v.3.7 (CLC bio, 
Denmark). The EST sequences were screened for the 
presence of perfect SSRs with at least 5 repeats; these 
sequences were selected, annotated and filed for primer 
design. 

Primer design 

The non-redundant EST-SSRs were used to design primers 
to the flanking sequences using PRIMER3 [48]. PRIMER3 
was set to the following parameters: primer length of 
18-27 bases, optimal annealing temperature (Tm) from 55 
to 60°C, target amplicon size of 100-500 bp, and GC 
content between 30 and 70% (50% as the optimum). All 
other parameters were set to default values. The output 
from PRIMER3 was further analyzed in order to lessen the 
chance of encompassing tandem repeats in primer sequences 
and of self- and pair-complementarity. 
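
The post-design screening described above could look roughly like the sketch below. It does not call PRIMER3; all function names are hypothetical, the GC limits are passed in as parameters whose defaults are placeholders to be set to the study's actual values, and the 3'-end check is a deliberately crude proxy for a proper complementarity score.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100 * sum(base in "GC" for base in primer) / len(primer)


def has_tandem_repeat(primer, min_repeats=5):
    """True if any 1-4 bp unit occurs min_repeats or more times in a row."""
    pattern = r"([ACGT]{1,4})\1{%d,}" % (min_repeats - 1)
    return re.search(pattern, primer.upper()) is not None


def three_prime_complementary(primer_a, primer_b, window=4):
    """Crude check: can the last `window` bases of primer_a pair with primer_b?"""
    tail = primer_a.upper()[-window:]
    target = tail.translate(COMPLEMENT)[::-1]  # reverse complement of the tail
    return target in primer_b.upper()


def passes_screen(primer, min_len=18, max_len=27, min_gc=30.0, max_gc=70.0):
    """Length, GC content and tandem-repeat screen (GC defaults are placeholders)."""
    return (
        min_len <= len(primer) <= max_len
        and min_gc <= gc_percent(primer) <= max_gc
        and not has_tandem_repeat(primer)
    )


fw, rv = "CCCCAAAGAGAGTACACCAAAG", "ATCAGCCAGTGTCGTATGAATG"
print(passes_screen(fw), passes_screen(rv), three_prime_complementary(fw, rv))
```

The example pair is taken from the legible part of Table 2; a primer that fails any of the three conditions would be sent back for redesign rather than silently dropped.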